Visualizing Data

This lesson introduces three of the most popular Python libraries for data visualization: Pandas, Plotly, and Seaborn. Each library offers unique capabilities for analyzing and presenting data. You will gain hands-on experience comparing these tools while developing skills to create insightful visualizations like bar charts, line charts, and scatterplots.

Data skills | concepts

  • Pandas
  • Plotly
  • Seaborn

Learning objectives

  1. Compare and contrast Pandas, Plotly, and Seaborn for visualizing data in Python.
  2. Formulate a data-driven question and outline the steps needed to filter, aggregate, and visualize data effectively.
  3. Create and customize bar charts to compare categorical data.
  4. Illustrate trends and patterns over time using line charts.
  5. Explore relationships between two variables through scatterplots.

This tutorial is designed to support workshops hosted by The Ohio State University Libraries Research Commons. It assumes you already have a basic understanding of Python, including how to iterate through lists and dictionaries to extract data using a for loop. To learn basic Python concepts, visit the Python - Mastering the Basics tutorial.

PANDAS

Pandas is a powerful Python library designed to help you organize, explore, and analyze tabular data. Pandas can be used to generate summary statistics and build basic visualizations.

Pandas integrates with Matplotlib to generate simple plots using the .plot() method. The kind = parameter specifies the type of chart to create:

kind =            Chart type
line              line chart (default)
bar or barh       vertical or horizontal bar chart
hist              histogram
box               boxplot
kde or density    kernel density estimation plot
area              area plot
scatter           scatterplot
hexbin            hexagonal bin plot
pie               pie chart

📊 Bar Chart

To explore visualizing data with Pandas, let’s build a bar chart that highlights the average U.S. peak chart position for albums by the 2025 Rock & Roll Hall of Fame inductees.[1]

The syntax for building a basic Pandas chart is:

DataFrame.plot(*args, **kwargs)

Step 1. Import libraries

Pandas works alongside the Matplotlib library to visualize data.

import pandas as pd
import matplotlib.pyplot as plt 

Step 2. Read in files

We’ll use the rock_n_roll_performers.csv table from the Wikipedia page on Rock and Roll Hall of Fame inductees to explore plotting with Pandas. The Performers category honors recording artists and bands who have had a significant and lasting impact on the development and legacy of rock and roll. We’ll also enhance our analysis by linking this dataset with rock_n_roll_studio_albums.csv, which contains studio album information for many of the inductees.

.read_csv()

performers=pd.read_csv('data/rock_n_roll_performers.csv', encoding="utf-8")

# A UnicodeDecodeError occurs when Pandas reads rock_n_roll_studio_albums.csv as UTF-8. Copilot suggests trying a different encoding, such as latin1.
studio_albums=pd.read_csv('data/rock_n_roll_studio_albums.csv', encoding='latin1')
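
The encoding switch above can be sketched as a small helper that tries several encodings in turn. This is only an illustration: read_csv_with_fallback and the sample file are hypothetical names invented here, not part of the lesson's data. Note that latin1 can decode any byte sequence, so it never raises an error, but it may display the wrong characters if the file was actually saved in another encoding.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "latin1")):
    """Hypothetical helper: return (DataFrame, encoding) for the first encoding that decodes."""
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc), enc
        except UnicodeDecodeError:
            continue  # try the next encoding in the list
    raise ValueError(f"None of {encodings} could decode {path}")


# Made-up sample file written with latin1 bytes, so reading it as UTF-8 fails
sample_path = Path(tempfile.mkdtemp()) / "sample_latin1.csv"
sample_path.write_bytes("artist,album_title\nBjörk,Début\n".encode("latin1"))

sample_df, used_encoding = read_csv_with_fallback(sample_path)
print(used_encoding)
print(sample_df)
```

After reading a file this way, it is worth scanning a few rows with accented characters to confirm they look right.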

Step 2. Merge datasets

After loading the performers and studio_albums tables using pd.read_csv, we can inspect the column headers using .columns.

performers.columns
Index(['index', 'year', 'image', 'name', 'inducted_members',
       'prior_nominations', 'induction_presenter', 'artist', 'image_url',
       'artist_url'],
      dtype='object')
studio_albums.columns
Index(['index', 'album_title', 'artist', 'certification_aria',
       'certification_aria_status', 'certification_aria_x',
       'certification_bmvi', 'certification_bmvi_status',
       'certification_bmvi_x', 'certification_bpi', 'certification_bpi_status',
       'certification_bpi_x', 'certification_mc', 'certification_mc_status',
       'certification_mc_x', 'certification_riaa', 'certification_riaa_status',
       'certification_riaa_x', 'certification_snep',
       'certification_snep_status', 'certification_snep_x', 'day',
       'format_4_track', 'format_8_track', 'format_blueray', 'format_box_set',
       'format_cassette', 'format_cd', 'format_digital_compact_cassette',
       'format_digital_download', 'format_dvd', 'format_lp',
       'format_mini_disc', 'format_picture_disc', 'format_reel',
       'format_streaming', 'format_vhs', 'month', 'peakAUS', 'peakAUT',
       'peakCAN', 'peakFRA', 'peakGER', 'peakIRE', 'peakITA', 'peakJPN',
       'peakNLD', 'peakNOR', 'peakNZ', 'peakSPA', 'peakSWE', 'peakSWI',
       'peakUK', 'peakUS', 'peakUS Country', 'peakUS R&B', 'Record label',
       'Release date', 'year'],
      dtype='object')

Both datasets share the columns artist and year, which could be used for merging. However, to avoid confusion after joining, we’ll first rename the header year in the performers dataset to year_inducted.

.rename()

performers=performers.rename(columns={'year':'year_inducted'})  
performers.columns
Index(['index', 'year_inducted', 'image', 'name', 'inducted_members',
       'prior_nominations', 'induction_presenter', 'artist', 'image_url',
       'artist_url'],
      dtype='object')

Then, we’ll merge the two datasets using the shared artist column.

performers_albums=pd.merge(performers, studio_albums, on='artist')
performers_albums.columns
Index(['index_x', 'year_inducted', 'image', 'name', 'inducted_members',
       'prior_nominations', 'induction_presenter', 'artist', 'image_url',
       'artist_url', 'index_y', 'album_title', 'certification_aria',
       'certification_aria_status', 'certification_aria_x',
       'certification_bmvi', 'certification_bmvi_status',
       'certification_bmvi_x', 'certification_bpi', 'certification_bpi_status',
       'certification_bpi_x', 'certification_mc', 'certification_mc_status',
       'certification_mc_x', 'certification_riaa', 'certification_riaa_status',
       'certification_riaa_x', 'certification_snep',
       'certification_snep_status', 'certification_snep_x', 'day',
       'format_4_track', 'format_8_track', 'format_blueray', 'format_box_set',
       'format_cassette', 'format_cd', 'format_digital_compact_cassette',
       'format_digital_download', 'format_dvd', 'format_lp',
       'format_mini_disc', 'format_picture_disc', 'format_reel',
       'format_streaming', 'format_vhs', 'month', 'peakAUS', 'peakAUT',
       'peakCAN', 'peakFRA', 'peakGER', 'peakIRE', 'peakITA', 'peakJPN',
       'peakNLD', 'peakNOR', 'peakNZ', 'peakSPA', 'peakSWE', 'peakSWI',
       'peakUK', 'peakUS', 'peakUS Country', 'peakUS R&B', 'Record label',
       'Release date', 'year'],
      dtype='object')
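
To see why the merged table contains index_x and index_y, and why it can have many rows per artist, here is a minimal sketch on two made-up tables. The artist names and values are illustrative assumptions, not rows from the lesson's files.

```python
import pandas as pd

# Made-up miniature tables; names and values are illustrative only
performers_demo = pd.DataFrame({
    "index": [0, 1, 2],
    "artist": ["Artist A", "Artist B", "Artist C"],
    "year_inducted": [2024, 2025, 2025],
})
albums_demo = pd.DataFrame({
    "index": [0, 1, 2],
    "artist": ["Artist A", "Artist A", "Artist B"],
    "album_title": ["First", "Second", "Debut"],
    "year": [1970, 1972, 1980],
})

# The default merge is an inner join: only artists found in both tables are kept,
# and each performer row is repeated once per matching album
merged_demo = pd.merge(performers_demo, albums_demo, on="artist")
print(list(merged_demo.columns))
print(len(merged_demo))

# how="left" keeps performers without albums; their album columns are filled with NaN
left_demo = pd.merge(performers_demo, albums_demo, on="artist", how="left")
print(len(left_demo))
```

Columns that exist in both tables but are not used as the merge key (here, index) receive the _x and _y suffixes, which is why renaming year beforehand keeps the merged headers easier to read.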

Step 3. Create and apply filters

Now that we’ve merged the inductee and album datasets, we can begin filtering the data to focus on specific trends or groups.

Before applying any filters, it’s important to confirm the data type of the year_inducted column in the performers_albums DataFrame. This ensures we can perform numerical comparisons or sorting without errors.

.dtypes

performers_albums['year_inducted'].dtypes
dtype('int64')

We can create and apply a filter to isolate the 2025 inductees using the year_inducted field.

filter_variable=df['column']==value

#First create the filter
_2025_inductees= performers_albums['year_inducted']==2025 

filtered_df=df[filter_variable]

#Then apply the filter
performers_albums_filtered=performers_albums[_2025_inductees]
performers_albums_filtered
index_x year_inducted image name inducted_members prior_nominations induction_presenter artist image_url artist_url ... peakSWI peakUK peakUS peakUS Country peakUS R&B Record label Release date year Formatted Release date Release year
4770 264 2025 NaN Bad Company[193] Boz Burrell, Simon Kirke, Mick Ralphs, and Pau... First nomination NaN Bad Company /wiki/File:Bad_Company_-_1976.jpg /wiki/Bad_Company ... NaN 3.0 1.0 NaN NaN Island, Swan Song 1974-05-24 1974 1974 1974
4771 264 2025 NaN Bad Company[193] Boz Burrell, Simon Kirke, Mick Ralphs, and Pau... First nomination NaN Bad Company /wiki/File:Bad_Company_-_1976.jpg /wiki/Bad_Company ... NaN 3.0 3.0 NaN NaN Island, Swan Song 1975-03-28 1975 1975 1975
4772 264 2025 NaN Bad Company[193] Boz Burrell, Simon Kirke, Mick Ralphs, and Pau... First nomination NaN Bad Company /wiki/File:Bad_Company_-_1976.jpg /wiki/Bad_Company ... NaN 4.0 5.0 NaN NaN Island, Swan Song 1976-01-30 1976 1976 1976
4773 264 2025 NaN Bad Company[193] Boz Burrell, Simon Kirke, Mick Ralphs, and Pau... First nomination NaN Bad Company /wiki/File:Bad_Company_-_1976.jpg /wiki/Bad_Company ... NaN 17.0 15.0 NaN NaN Island, Swan Song 1977-03-03 1977 1977 1977
4774 264 2025 NaN Bad Company[193] Boz Burrell, Simon Kirke, Mick Ralphs, and Pau... First nomination NaN Bad Company /wiki/File:Bad_Company_-_1976.jpg /wiki/Bad_Company ... NaN 10.0 3.0 NaN NaN Swan Song 1979-03-07 1979 1979 1979
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4846 270 2025 NaN The White Stripes[193] Jack White and Meg White.[197][198] 1 (2023) NaN The White Stripes /wiki/File:Jack_%26_Meg,_The_White_Stripes.jpg /wiki/The_White_Stripes ... NaN 137.0 NaN NaN NaN Sympathy for the Record Industry 2000-06-20 2000 2000 2000
4847 270 2025 NaN The White Stripes[193] Jack White and Meg White.[197][198] 1 (2023) NaN The White Stripes /wiki/File:Jack_%26_Meg,_The_White_Stripes.jpg /wiki/The_White_Stripes ... NaN 55.0 61.0 NaN NaN Sympathy for the Record Industry, Third Man Re... 2001-07-03 2001 2001 2001
4848 270 2025 NaN The White Stripes[193] Jack White and Meg White.[197][198] 1 (2023) NaN The White Stripes /wiki/File:Jack_%26_Meg,_The_White_Stripes.jpg /wiki/The_White_Stripes ... NaN 1.0 6.0 NaN NaN V2 2003-04-01 2003 2003 2003
4849 270 2025 NaN The White Stripes[193] Jack White and Meg White.[197][198] 1 (2023) NaN The White Stripes /wiki/File:Jack_%26_Meg,_The_White_Stripes.jpg /wiki/The_White_Stripes ... NaN 3.0 3.0 NaN NaN V2 2005-06-07 2005 2005 2005
4850 270 2025 NaN The White Stripes[193] Jack White and Meg White.[197][198] 1 (2023) NaN The White Stripes /wiki/File:Jack_%26_Meg,_The_White_Stripes.jpg /wiki/The_White_Stripes ... NaN 1.0 2.0 NaN NaN Third Man, Warner Bros. 2007-06-15 2007 2007 2007

81 rows × 70 columns
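
As a sketch of what the filter variable actually holds, the example below builds boolean masks on a small made-up table and combines two of them. The column values and the second condition (has_us_peak) are assumptions added for illustration.

```python
import pandas as pd

# Made-up rows; only the column names mirror the lesson's DataFrame
demo = pd.DataFrame({
    "artist": ["Artist A", "Artist B", "Artist B", "Artist C"],
    "year_inducted": [2024, 2025, 2025, 2025],
    "peakUS": [12.0, 3.0, None, 40.0],
})

# A filter is a boolean Series with one True/False value per row
is_2025 = demo["year_inducted"] == 2025
has_us_peak = demo["peakUS"].notna()
print(is_2025.tolist())

# & combines two masks row by row; only rows where both are True are kept
charted_2025 = demo[is_2025 & has_us_peak]
print(charted_2025)
```

Storing each condition in its own variable, as the lesson does, keeps the combined filter readable.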

Step 4. Aggregate data

Pandas supports a variety of basic summary statistics through built-in methods:

Method       Description
.count()     number of non-missing observations
.sum()       sum of values
.mean()      mean (average)
.median()    median
.min()       minimum
.max()       maximum
.mode()      mode (most frequent value)
.std()       standard deviation
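
A quick way to get a feel for these methods is to call them on a small Series. The values below are made up, with one missing entry to show that missing values are skipped.

```python
import pandas as pd

# Made-up chart positions; the last value is missing
peaks = pd.Series([1, 3, 3, 15, None], name="peakUS")

summary = {
    "count": peaks.count(),        # missing values are not counted
    "sum": peaks.sum(),
    "mean": peaks.mean(),
    "median": peaks.median(),
    "min": peaks.min(),
    "max": peaks.max(),
    "mode": peaks.mode().tolist(),  # .mode() returns a Series because ties are possible
    "std": peaks.std(),             # sample standard deviation
}
for name, value in summary.items():
    print(name, value)
```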

.groupby()

To calculate statistics grouped by category—such as average chart positions by artist or year—we use the .groupby() method. This allows us to aggregate data based on one or more columns before applying summary functions.

performers_albums_filtered.groupby('artist')['peakUS'].mean()
artist
Bad Company          41.000000
Chubby Checker       57.333333
Cyndi Lauper         59.100000
Joe Cocker           62.571429
Outkast               4.833333
Soundgarden          31.000000
The White Stripes    18.000000
Name: peakUS, dtype: float64
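
If more than one statistic per group is useful, .agg() accepts a list of method names. The sketch below uses made-up values rather than the lesson's data.

```python
import pandas as pd

# Made-up albums; one peakUS value is missing
demo = pd.DataFrame({
    "artist": ["Artist A", "Artist A", "Artist B", "Artist B", "Artist B"],
    "peakUS": [1.0, 5.0, 10.0, 20.0, None],
})

# One row per artist, one column per statistic
stats = demo.groupby("artist")["peakUS"].agg(["mean", "min", "count"])
print(stats)
```

Showing count next to mean is a useful check, since an average based on only one or two charting albums can be misleading.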

Step 5. Plot

The last step to build our bar chart is to add the .plot(*args, **kwargs) method with relevant arguments and keyword arguments.

First, use the .sort_values(ascending=False) method to sort the bars in descending order. Then call .plot() with the keyword arguments:

  • kind = 'bar'
  • xlabel = '' (removes the redundant label on the x-axis)
  • title = 'Average Peak US Chart Position: \n 2025 Rock N Roll Hall of Fame Inductees' (\n = newline)

performers_albums_filtered.groupby('artist')['peakUS'].mean().sort_values(ascending=False).plot(kind='bar', xlabel='', title='Average Peak US Chart Position: \n 2025 Rock N Roll Hall of Fame Inductees')
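
The .plot() method returns a Matplotlib Axes object, which can be customized further or saved to a file. The following is a minimal sketch with made-up averages. The output file name and the y-axis label are assumptions chosen for illustration, and the Agg backend line is only there so the script runs without a display (it is not needed in a notebook).

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

# Made-up averages standing in for the grouped result
avg_peak = pd.Series({"Artist A": 41.0, "Artist B": 4.8, "Artist C": 18.0}, name="peakUS")

ax = avg_peak.sort_values(ascending=False).plot(
    kind="bar", xlabel="", title="Average Peak US Chart Position"
)
ax.set_ylabel("Average peak position (1 = best)")  # clarify that lower is better
ax.tick_params(axis="x", rotation=45)              # tilt long artist names

# Save the finished chart; the path here is a temporary, illustrative location
out_path = Path(tempfile.mkdtemp()) / "average_peak_us.png"
ax.figure.tight_layout()
ax.figure.savefig(out_path)
```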

📈 Line chart

Line charts reveal trends over time and at minimum require a date field and a measure. To create a line chart with Pandas, set the kind = parameter to line.

Exercise 1

Create a line chart that shows the total number of albums released each year by all artists in the performers_albums dataset.
  1. Identify relevant columns and data types
  2. Aggregate data and plot

Step 1. Identify relevant columns and data types

Before creating a chart, the first step is to identify which columns are needed for the visualization. Once those columns are selected, we’ll check their data types to ensure they are suitable for analysis and plotting.

performers_albums.columns
Index(['index_x', 'year_inducted', 'image', 'name', 'inducted_members',
       'prior_nominations', 'induction_presenter', 'artist', 'image_url',
       'artist_url', 'index_y', 'album_title', 'certification_aria',
       'certification_aria_status', 'certification_aria_x',
       'certification_bmvi', 'certification_bmvi_status',
       'certification_bmvi_x', 'certification_bpi', 'certification_bpi_status',
       'certification_bpi_x', 'certification_mc', 'certification_mc_status',
       'certification_mc_x', 'certification_riaa', 'certification_riaa_status',
       'certification_riaa_x', 'certification_snep',
       'certification_snep_status', 'certification_snep_x', 'day',
       'format_4_track', 'format_8_track', 'format_blueray', 'format_box_set',
       'format_cassette', 'format_cd', 'format_digital_compact_cassette',
       'format_digital_download', 'format_dvd', 'format_lp',
       'format_mini_disc', 'format_picture_disc', 'format_reel',
       'format_streaming', 'format_vhs', 'month', 'peakAUS', 'peakAUT',
       'peakCAN', 'peakFRA', 'peakGER', 'peakIRE', 'peakITA', 'peakJPN',
       'peakNLD', 'peakNOR', 'peakNZ', 'peakSPA', 'peakSWE', 'peakSWI',
       'peakUK', 'peakUS', 'peakUS Country', 'peakUS R&B', 'Record label',
       'Release date', 'year'],
      dtype='object')
performers_albums['Release date'].dtypes
dtype('O')

In Pandas, dtype('O') stands for object data type. This is a general-purpose type used when a column contains:

  • Strings (most common)
  • Mixed types (e.g., numbers and text)
  • Python objects (less common)

So if you see dtype('O') for a column, it usually means that column contains text or string values.

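Because Release date is stored as text, the .dt accessor is not available yet. First convert the column to datetime with pd.to_datetime(), then extract the year:

performers_albums['Release date']=pd.to_datetime(performers_albums['Release date'])
performers_albums['Release year']=performers_albums['Release date'].dt.year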
performers_albums['Release year']
0       1957
1       1958
2       1959
3       1960
4       1961
        ... 
4846    2000
4847    2001
4848    2003
4849    2005
4850    2007
Name: Release year, Length: 4851, dtype: int32
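
As a sketch of how a year can be pulled out of date strings, the example below parses a small made-up column with pd.to_datetime and then uses the .dt accessor. The errors="coerce" option and the nullable Int64 conversion are choices made for this illustration, not steps taken from the lesson.

```python
import pandas as pd

# Made-up text dates, including one value that cannot be parsed
dates_demo = pd.DataFrame({"Release date": ["1974-05-24", "1975-03-28", "not a date"]})

# errors="coerce" turns unparseable values into NaT instead of raising an error
parsed = pd.to_datetime(dates_demo["Release date"], errors="coerce")

# .dt.year works only on datetime values; Int64 keeps whole-number years
# even when some rows are missing
dates_demo["Release year"] = parsed.dt.year.astype("Int64")
print(dates_demo)
```

Counting the missing values with parsed.isna().sum() afterwards shows how many dates failed to parse.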

Step 2. Aggregate data and plot

Group the rows by Release year, count the album_title values in each group, and plot the result with .plot(kind='line').

performers_albums.groupby('Release year')['album_title'].count().plot(kind='line', xlabel='', title='Line Chart')

░ Scatterplot

Scatterplots are useful for exploring relationships between two or more numerical variables. In Pandas, you can create a scatterplot with .plot(kind='scatter') or .plot.scatter() by specifying the x and y keyword arguments.

Exercise 2

Use the Pandas scatterplot documentation to create a scatterplot that visualizes the relationship between the Peak US and Peak UK chart positions for albums released by your favorite artist inducted into the Rock and Roll Hall of Fame.

BONUS: reverse the x and y axes.

#Step 1. Create and apply a filter for your favorite artist
favorite_artist=performers_albums['artist']=='Kate Bush'
kate_bush=performers_albums[favorite_artist]

#Step 2. Plot the x and y axis
kate_bush.plot.scatter(x='peakUS',y='peakUK')
#BONUS
plt.gca().invert_xaxis()
plt.gca().invert_yaxis()
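
An alternative to plt.gca() is to keep the Axes object that .plot.scatter() returns and invert its axes directly. The sketch below uses made-up chart positions. The Agg backend line is an assumption so the script runs without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd

# Made-up albums for a hypothetical artist
albums_demo = pd.DataFrame({
    "album_title": ["First", "Second", "Third"],
    "peakUS": [30.0, 12.0, 4.0],
    "peakUK": [3.0, 1.0, 2.0],
})

ax = albums_demo.plot.scatter(x="peakUS", y="peakUK", title="Peak US vs. UK chart positions")

# Chart position 1 is the best, so flip both axes to put the best albums at the top right
ax.invert_xaxis()
ax.invert_yaxis()
```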

PLOTLY

The Plotly Open Source Graphing Library for Python is a robust and versatile library that offers over 40 types of interactive data visualizations, from basic bar and bubble charts to advanced 3D scatter and 3D surface plots. Plotly charts are fully interactive, enabling users to zoom, pan, hover for tooltips, and export visuals directly from the browser.

Before using Plotly, be sure to follow the installation instructions provided in the official guide: Getting Started with Plotly in Python

Let’s use Plotly to build a bar chart, a line chart, and a scatterplot using the performers_albums DataFrame.

📊 Bar Chart

Build a bar chart showing the best (numerically lowest) U.S. peak chart position reached by any album from each 2025 Rock & Roll Hall of Fame inductee.

import plotly.express as px
import plotly.io as pio
pio.renderers.default="plotly_mimetype+notebook_connected" # This statement is needed to display Plotly in html

#We already have a filtered DataFrame for the 2025 inductees. Since the best chart position is 1, we need to tell Pandas to find the minimum peakUS chart position for each artist.
agg_performers_albums_filtered=performers_albums_filtered.groupby('artist', as_index=False)['peakUS'].min()

#Now we build our bar chart with Plotly Express (px); plotly.io (pio) only configures how charts are rendered
fig=px.bar(agg_performers_albums_filtered, x='artist', y='peakUS', title="Peak US Chart Position: 2025 Rock n Roll Hall of Fame Inductees", color='artist')
fig.show()

📈 Line chart

Create a line chart that shows the total number of albums released each year by all artists in the performers_albums dataset.

#We already converted 'Release date' to year in the code above. Now we tell Pandas to count the number of rows for each 'Release year' using the .size() method

agg_for_line_performers_albums=performers_albums.groupby(['Release year'], as_index=False).size()

#Build the chart
fig2=px.line(agg_for_line_performers_albums, x='Release year', y='size', title="Line Chart", markers=False)
fig2.show()

▒ Scatterplot

Create a scatterplot that visualizes the relationship between the Peak US and Peak UK chart positions for albums released by your favorite artist inducted into the Rock and Roll Hall of Fame.

#We already filtered the DataFrame for our favorite artist. Use this DataFrame to build the chart.
fig3=px.scatter(kate_bush, x='peakUS', y='peakUK', color='album_title', title="Kate Bush Albums: Peak US vs. UK Chart Positions")

fig3.show()

SEABORN

Built on top of Matplotlib and seamlessly integrated with Pandas, the Seaborn library enhances the visual appeal of Python charts with minimal effort. Featuring built-in themes, concise syntax, and a rich gallery of customizable examples, Seaborn helps you create polished, publication-quality visualizations quickly and effectively.

📊 Bar Chart

Build a bar chart showing the best (numerically lowest) U.S. peak chart position reached by any album from each 2025 Rock & Roll Hall of Fame inductee.

import seaborn as sns

#Create bar chart
sns.set(style="whitegrid")
plt.figure(figsize=(10,6))
sns.barplot(x='artist', y='peakUS', data=agg_performers_albums_filtered)

# Add title and labels
plt.title("Peak US Chart Position \n 2025 Rock n Roll Hall of Fame Inductees")
plt.xlabel("")
plt.ylabel("Peak US chart position")
Text(0, 0.5, 'Peak US chart position')

📈 Line chart

Using the Seaborn library’s "Emphasizing continuity with line plots" tutorial together with the seaborn.lineplot API documentation, create a line chart that shows the total number of albums released each year by all artists in the performers_albums dataset.

Format the chart to:

  • Remove gridlines and borders
  • Adjust the x and y axis labels
  • Name the chart

#Line chart

#remove gridlines
sns.set(style="white")

#Group and count number of release years using Pandas .size() method. Reset the index and name the new column 'count'
released_albums_by_year=performers_albums.groupby(['Release year']).size().reset_index(name='count')

#Build chart
sns.lineplot(data=released_albums_by_year, x='Release year', y='count')

#despine (i.e. remove borders); call this after plotting so it acts on the chart's axes
sns.despine(top=True, right=True, left=True, bottom=True)

plt.title("Line Chart")
plt.xlabel("")
plt.ylabel("# albums released")
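
The line charts in this lesson aggregate albums per year in two slightly different ways: .count() on album_title in the Pandas section and .size() in the Plotly and Seaborn sections. The sketch below, on made-up rows, shows where the two can differ.

```python
import pandas as pd

# Made-up rows; one album title is missing
demo = pd.DataFrame({
    "Release year": [1974, 1974, 1975, 1977],
    "album_title": ["A", None, "C", "D"],
})

# .size() counts every row in the group, including rows with missing titles
by_size = demo.groupby("Release year").size().reset_index(name="count")
print(by_size)

# .count() counts only the non-missing values in the selected column
by_count = demo.groupby("Release year")["album_title"].count()
print(by_count)
```

Either way, a year with no releases (1976 here) does not appear in the result at all, so the line is drawn straight between the neighboring years rather than dropping to zero.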

▒ Scatterplot

Using the Seaborn library’s "Visualizing statistical relationships" tutorial together with the seaborn.scatterplot API documentation, create a scatterplot that visualizes the relationship between the Peak US and Peak UK chart positions for albums released by your favorite artist inducted into the Rock and Roll Hall of Fame.

BONUS: reverse the x and y axes.

#We already filtered the DataFrame for our favorite artist. Use this DataFrame to build the chart.

sns.relplot(data=kate_bush, x='peakUS', y='peakUK', hue='album_title')

Check out the "Controlling figure aesthetics" and "Choosing color palettes" tutorials to learn how to customize themes and fine-tune the appearance of your Seaborn visualizations.

Footnotes

  1. Visit the Websites and APIs. Lesson 3. Wikipedia tutorial to learn how to extract tables from HTML using pandas.read_html. See the Websites and APIs. Lesson 4. iCite tutorial and Websites and APIs. Lesson 7. Crossref tutorial to learn how to use APIs to gather data.